Contributions to the Computational Treatment of Non-literal Language
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Non-literal language concerns the deliberate use of language in such a way that meaning cannot be inferred through a mere literal interpretation. In this thesis, three different forms of this phenomenon are studied, namely irony, non-compositional Multiword Expressions (MWEs), and metaphor.
We start by developing models to identify ironic comments in the context of the social micro-blogging website Twitter. In these experiments, we propose a new way to extract features based on a study of the spatial structure of tweets. The proposed model is shown to perform competitively on a standard Twitter dataset.
Next, we extensively study MWEs, which are the central point of focus in this work. We start by framing the task of MWE identification as sequence labelling and devise experiments to examine the effect of eye-tracking data on capturing formulaic MWEs using structured prediction.
We also develop a novel neural architecture to specifically address the issue of discontinuous MWEs using a combination of Graph Convolutional Neural Networks (GCNs) and self-attention. The proposed model is subsequently tested on several languages, where it is shown to outperform the state of the art both overall and in capturing gappy MWEs.
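The combination described above can be sketched as a graph-convolution layer over a dependency graph followed by a self-attention layer that lets tokens attend to non-adjacent positions (the property that helps with gappy MWEs). The following numpy sketch is purely illustrative: all dimensions, weights, and the toy adjacency matrix are assumptions, not the thesis's actual architecture.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: aggregate neighbour features over
    the (dependency) adjacency matrix A, then project with W."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # degree normalisation
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)  # ReLU

def self_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: lets every token
    attend to every other token, including non-adjacent ones."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n_tokens, d = 5, 8
H = rng.normal(size=(n_tokens, d))            # token embeddings
A = np.zeros((n_tokens, n_tokens))
A[0, 1] = A[1, 0] = A[1, 3] = A[3, 1] = 1.0   # toy dependency edges
H = gcn_layer(H, A, rng.normal(size=(d, d)))
H = self_attention(H, *(rng.normal(size=(d, d)) for _ in range(3)))
print(H.shape)  # (5, 8): one contextualised vector per token
```

In a real tagger, the resulting per-token representations would feed a classification layer that emits MWE tags.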
In the final part of the thesis, we look at metaphor and its interaction with verbal MWEs. In a series of experiments, we propose a hybrid BERT-based model augmented with a novel variation of GCN, performing classification on two standard metaphor datasets using information from MWEs. This model, which performs on a par with the state of the art, is, to the best of our knowledge, the first MWE-aware metaphor identification system, paving the way for further experimentation on the interaction of different types of figurative language.
Research Group in Computational Linguistics
Cross-lingual transfer learning and multitask learning for capturing multiword expressions
This is an accepted manuscript of an article published by Association for Computational Linguistics in Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), available online: https://www.aclweb.org/anthology/W19-5119
The accepted version of the publication may differ from the final published version. Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). For MTL, we exploit the shared syntactic information between MWE and dependency parsing models to jointly train a single model on both tasks, predicting two types of labels: MWE and dependency parse. Our neural MTL architecture uses the supervision of dependency parsing in lower layers and predicts MWE tags in upper layers. In the TRL scenario, we overcome the scarcity of data by learning a model on a larger MWE dataset and transferring the knowledge to a resource-poor setting in another language. In both scenarios, the resulting models achieved higher performance than standard neural approaches.
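The layer arrangement described in the abstract (dependency supervision in lower layers, MWE tags in upper layers) is an instance of hard parameter sharing. The following numpy forward pass is a minimal sketch of that idea; all layer sizes, weights, and the use of simple feed-forward layers are illustrative assumptions, not the paper's actual recurrent architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MTLTagger:
    """Hard parameter sharing: a shared lower layer supervised by
    dependency labels, and an upper layer predicting MWE tags."""
    def __init__(self, d_in, d_hid, n_dep, n_mwe, seed=0):
        rng = np.random.default_rng(seed)
        self.W_shared = rng.normal(size=(d_in, d_hid)) * 0.1
        self.W_dep = rng.normal(size=(d_hid, n_dep)) * 0.1   # lower-layer head
        self.W_upper = rng.normal(size=(d_hid, d_hid)) * 0.1
        self.W_mwe = rng.normal(size=(d_hid, n_mwe)) * 0.1   # upper-layer head

    def forward(self, X):
        h_low = relu(X @ self.W_shared)
        dep_probs = softmax(h_low @ self.W_dep)   # dependency supervision here
        h_high = relu(h_low @ self.W_upper)
        mwe_probs = softmax(h_high @ self.W_mwe)  # MWE tags from upper layers
        return dep_probs, mwe_probs

model = MTLTagger(d_in=16, d_hid=32, n_dep=10, n_mwe=3)
X = np.random.default_rng(1).normal(size=(7, 16))  # 7 tokens
dep, mwe = model.forward(X)
print(dep.shape, mwe.shape)  # (7, 10) (7, 3)
```

During training, both heads would receive their own loss, so gradients from the dependency task shape the shared representation that the MWE head builds on.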
GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks
This paper describes the system submitted to the SemEval 2019 shared task 1 "Cross-lingual Semantic Parsing with UCCA". We rely on the semantic dependency parse trees provided in the shared task, which are converted from the original UCCA files, and model the task as tagging. The aim is to predict the graph structure of the output along with the types of relations among the nodes. Our proposed neural architecture is composed of Graph Convolution and BiLSTM components. The layers of the system share their weights while predicting dependency links and semantic labels. The system is applied to the CoNLL-U format of the input data and is best suited for semantic dependency parsing.
WLV at SemEval-2018 task 3: Dissecting tweets in search of irony
International Workshop on Semantic Evaluation. WLV at SemEval-2018 Task 3. This paper describes the systems submitted to SemEval 2018 Task 3 "Irony detection in English tweets" for both subtasks A and B. The first system, leveraging a combination of sentiment, distributional semantic, and text surface features, ranked third among 44 teams on the official leaderboard of subtask A. The second system, with a slightly different representation of the features, ranked ninth in subtask B. We present a method that entails decomposing tweets into separate parts. Searching for contrast within the constituents of a tweet is an integral part of our system. We embrace an extensive definition of contrast, which leads to vast coverage in detecting ironic content.
Combining Multiple Corpora for Readability Assessment for People with Cognitive Disabilities
The 12th Workshop on Innovative Use of NLP for Building Educational Applications, 8th September 2017, Copenhagen, Denmark. Given the lack of large user-evaluated corpora in disability-related NLP research (e.g. text simplification or readability assessment for people with cognitive disabilities), the question of choosing suitable training data for NLP models is not straightforward. The use of large generic corpora may be problematic because such data may not reflect the needs of the target population. At the same time, the available user-evaluated corpora are not large enough to be used as training data. In this paper we explore a third approach, in which a large generic corpus is combined with a smaller population-specific corpus to train a classifier which is evaluated using two sets of unseen user-evaluated data. One of these sets, the ASD Comprehension corpus, is developed for the purposes of this study and made freely available. We explore the effects of the size and type of the training data on the performance of the classifiers, and the effects of the type of the unseen test datasets on classification performance.
Using gaze data to predict multiword expressions
In recent years, gaze data has been increasingly used to improve and evaluate NLP models, as it carries information about the cognitive processing of linguistic phenomena. In this paper we conduct a preliminary study towards the automatic identification of multiword expressions based on gaze features from native and non-native speakers of English. We report comparisons between a part-of-speech (POS) and frequency baseline and: i) a prediction model based solely on gaze data, and ii) a combined model of gaze data, POS, and frequency. In spite of the challenging nature of the task, the best performance was achieved by the latter. Furthermore, we explore how the type of gaze data (from native versus non-native speakers) affects the prediction, showing that data from the two groups is discriminative to an equal degree. Finally, we show that late processing measures are more predictive than early ones, which is in line with previous research on idioms and other formulaic structures.
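The combined model compared above concatenates gaze measures with POS and frequency features for each token. The sketch below shows one way such a feature vector might be assembled; the specific gaze measures, tag set, and scaling are assumptions for illustration, not the paper's actual feature set.

```python
import numpy as np

# Illustrative tag inventory; the paper's POS features may differ.
POS_TAGS = ["NOUN", "VERB", "ADP", "DET"]

def token_features(total_fix_ms, first_fix_ms, pos, log_freq):
    """Concatenate a late gaze measure (total fixation time), an early
    one (first fixation duration), a POS one-hot, and a log-frequency
    feature into a single classifier input vector."""
    gaze = np.array([total_fix_ms / 1000.0, first_fix_ms / 1000.0])
    pos_onehot = np.array([1.0 if pos == t else 0.0 for t in POS_TAGS])
    return np.concatenate([gaze, pos_onehot, [log_freq]])

x = token_features(total_fix_ms=430, first_fix_ms=210,
                   pos="VERB", log_freq=2.7)
print(x.shape)  # (7,): 2 gaze + 4 POS + 1 frequency
```

The finding that late measures outperform early ones would show up in such a model as a larger learned weight on the total-fixation feature than on the first-fixation one.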
On the Effectiveness of Compact Biomedical Transformers
Language models pre-trained on biomedical corpora, such as BioBERT, have recently shown promising results on downstream biomedical tasks. Many existing pre-trained models, on the other hand, are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension, and number of layers. The natural language processing (NLP) community has developed numerous strategies to compress these models using techniques such as pruning, quantisation, and knowledge distillation, resulting in models that are considerably faster, smaller, and consequently easier to use in practice. Following this trend, in this paper we introduce six lightweight models, namely BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT, and CompactBioBERT, which are obtained either by knowledge distillation from a biomedical teacher or by continual learning on the PubMed dataset via the Masked Language Modelling (MLM) objective. We evaluate all of our models on three biomedical tasks and compare them with BioBERT-v1.1 to create efficient lightweight models that perform on par with their larger counterparts. All the models will be publicly available on our Huggingface profile at https://huggingface.co/nlpie and the code used to run the experiments will be available at https://github.com/nlpie-research/Compact-Biomedical-Transformers
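Knowledge distillation, one of the two training routes mentioned above, trains the student to match the teacher's temperature-softened output distribution. A minimal numpy version of the standard soft-target loss is below; the temperature value and logit shapes are illustrative, and this is the generic formulation rather than the paper's exact training objective.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax: higher T flattens the distribution,
    exposing the teacher's 'dark knowledge' about non-target classes."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's soft targets and the
    student's predictions, scaled by T^2 as is conventional."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -(p_teacher * log_p_student).sum(axis=-1).mean() * T * T

rng = np.random.default_rng(0)
t = rng.normal(size=(4, 10))   # teacher logits, batch of 4
s = rng.normal(size=(4, 10))   # student logits
print(distillation_loss(s, t, T=2.0))
```

In practice this soft-target term is usually combined with the ordinary hard-label (here, MLM) loss, weighted by a mixing coefficient.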
Cognitive processing of multiword expressions in native and non-native speakers of English: evidence from gaze data
Gaze data has been used to investigate the cognitive processing of certain types of formulaic language such as idioms and binominal phrases; however, very little is known about the online cognitive processing of multiword expressions. In this paper we use gaze features to compare the processing of verb-particle and verb-noun multiword expressions to control phrases of the same part-of-speech pattern. We also compare the gaze data for certain components of these expressions and the control phrases in order to find out whether these components are processed differently from the whole units. We provide results for both native and non-native speakers of English and we analyse the importance of the various gaze features for the purpose of this study. We discuss our findings in light of the E-Z model of reading.
Continuous patient state attention models
Irregular time-series (ITS) are prevalent in electronic health records (EHR), as data is recorded in the EHR system according to clinical guidelines and requirements rather than for research, and also depends on the patient's health status. ITS present challenges for training machine learning algorithms, which are mostly built on the assumption of a coherent, fixed-dimensional feature space. In this paper, we propose a computationally efficient variant of the transformer based on the idea of cross-attention, called Perceiver, for time-series in healthcare. We further develop continuous patient state attention models, using the Perceiver and the transformer, to deal with ITS in EHR. The continuous patient state models use neural ordinary differential equations to learn the patient health dynamics, i.e., the patient health trajectory from the observed irregular time-steps, which enables them to sample any number of time-steps at any time. The performance of the proposed models is evaluated on the in-hospital-mortality prediction task on the PhysioNet-2012 challenge and MIMIC-III datasets. The Perceiver model significantly outperforms the baselines and reduces the computational complexity compared with the transformer model, without significant loss of performance. Carefully designed experiments to study irregularity in healthcare also show that the continuous patient state models outperform the baselines. The code is publicly released and verified at https://codeocean.com/capsule/4587224
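The efficiency gain of the Perceiver-style model above comes from cross-attention: a small, fixed set of latent vectors queries the full (possibly long and irregular) observation sequence, so the cost grows linearly in the number of observations rather than quadratically as in full self-attention. A minimal numpy sketch, with all sizes and weights illustrative rather than taken from the paper:

```python
import numpy as np

def cross_attention(latents, inputs, Wq, Wk, Wv):
    """Perceiver-style cross-attention: n_latents latent vectors query
    n_inputs observations, costing O(n_latents * n_inputs) instead of
    the O(n_inputs^2) of full self-attention."""
    Q = latents @ Wq                      # (n_latents, d)
    K, V = inputs @ Wk, inputs @ Wv       # (n_inputs, d)
    scores = Q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)     # softmax over observations
    return w @ V                          # (n_latents, d)

rng = np.random.default_rng(0)
d, n_latents, n_steps = 8, 4, 100         # 100 irregular observations
latents = rng.normal(size=(n_latents, d)) # learned in a real model
inputs = rng.normal(size=(n_steps, d))    # embedded EHR observations
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(latents, inputs, Wq, Wk, Wv)
print(out.shape)  # (4, 8): fixed-size summary regardless of n_steps
```

Because the output size is fixed by the number of latents, the same downstream predictor can consume sequences of any length, which is what makes the approach a natural fit for irregular time-series.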